Abstract

Non-Intrusive Load Monitoring (NILM) is a technology offering methods
to identify appliances in homes based on their consumption characteristics
and the total household demand. Recently, many novel NILM approaches
were introduced, tested on real-world data and evaluated with a common
evaluation metric. However, a fair comparison between different NILM
approaches, even when the same evaluation metric is used, is nearly
impossible due to incomplete or missing problem definitions. Each NILM
approach is typically evaluated under different test scenarios. Test
results are thus influenced by the considered appliances, the number of
used appliances, the device type representing the appliance and the
preprocessing stages denoising the consumption data. This paper introduces
a novel complexity measure of aggregated consumption data providing
an assessment of the problem complexity affected by the used appliances,
the appliance characteristics and the appliance usage over time. We test
our load disaggregation complexity on different real-world datasets and
with a state-of-the-art NILM approach. The introduced disaggregation
complexity measure is able to classify the disaggregation problem based
on the used appliance set and the considered measurement noise.


1 Introduction 

The power draw of a household is composed of the aggregated power profiles of
each household appliance. By knowing the household power draw as well as
the appliance characteristics, such as power consumption, it is possible to
disaggregate the household power draw into the components of the appliances
in use. NILM, also known as load disaggregation or non-intrusive appliance
load monitoring^1, was first introduced by Hart [1] in 1992 and addresses the
problem of disaggregating load profiles by means of different techniques and
algorithms. The NILM algorithms proposed so far cannot solve the NILM problem
in all its aspects. To be able to improve the state of the art of load
disaggregation approaches, it is necessary to compare different algorithms in
a fair way. Unfortunately, a fair comparison between different algorithms is
not possible because recent approaches are highly dependent on different
conditions and features such as:

^1 The terms NILM and load disaggregation are used interchangeably
throughout this paper.

• sampling frequency of the household power draw, 

• number of observed appliances, 

• appliance types (e.g., on/off appliances, multi-state appliances), 

• the filtering approach applied to the household power draw (the power
draw fed into the NILM algorithm is usually filtered and preprocessed
before evaluation) and

• set of used appliance features (e.g., steady state electrical characteristics, 
transient behavior, etc.). 

Therefore, algorithms can be compared with respect to, for example, how many
features are used or at which sampling frequency an algorithm is able to
work. But such a comparison lacks the ability to compare and evaluate the
results of the load disaggregator, even if the same dataset was used.
There is a need for a common quantitative measure for NILM which is algorithm
independent and considers data assumptions as well as data pre-processing.
Thus, the problem itself, which is created by the appliances used in a house,
their characteristics and their usage over time, has to be made comparable.
One possibility to make the load disaggregation problem comparable is to
describe the complexity of the problem, where the problem can be seen as a
simple time series. To describe the complexity of time series, different
complexity measures were proposed, such as entropy-based complexity measures
[2, 3, 4], used for different applications such as DNA sequences [5, 6] or
EEG signals [7, 8, 9]. The problem of load disaggregation is hard due to the
high variety of different appliances, their different ways of consuming
energy and their highly time-variant behavior introduced by the appliance
user. It is therefore necessary to include appliances and their
characteristics as well as the time-dependent behavior in the evaluation of
a possible complexity measure.

In this paper, we propose an approach to make the disaggregation problem
of aggregated power demands comparable by introducing two novel load
disaggregation complexity measures. To the best of our knowledge, this is the
first approach summarizing the disaggregation problem as a complexity value
created by statistical characteristics of the appliance set and the time
series behavior. A similar approach concentrating on the fundamental limits
of NILM was introduced in [10]. There, the authors derive an upper bound on
the probability of distinguishing scenarios for an arbitrary NILM algorithm,
which provides guarantees on when NILM is impossible without using
privacy-ensuring approaches as presented in [11]. The work in [10] differs
from our approach, as we try to make the problem of superimposed loads with
the used appliance characteristics comparable between different NILM
algorithms. The two proposed disaggregation complexity measures are evaluated
on real-world data and compared to the disaggregation result of a
state-of-the-art NILM algorithm.

The remainder of this paper is organized as follows: In Section 2 the
disaggregation complexity of aggregated power draws and the factors
influencing this complexity are identified. With this knowledge an appliance
set complexity and a time series complexity are defined in Section 3. In
Section 4 the used appliance datasets, the way to extract possible power
states out of measurement data and a possible load disaggregation approach
used for evaluations are defined, followed by Section 5 presenting three case
studies reviewing the complexity measures according to their suitability and
meaningfulness to describe the load disaggregation problem. The approach and
the results are discussed in Section 6. Section 7 concludes the paper.

2 Complexity of the Power Draw makes Hardness for Disaggregation

The problem of load disaggregation is to break down the household power draw
$P(t)$ into its power consumption components $p_n(t)$. This can be formulated
as the superimposition of the appliance power profiles over time as

$$P(t) = p_1(t) + p_2(t) + \dots + p_N(t) \quad \text{for } t \in \{1, \dots, T\} \qquad (1)$$

where $N$ represents the number of used appliances. Each power profile has
its own energy consumption behavior, determined by the appliance power states
(e.g., on/off appliance, multi-state appliance) and the appliance usage over
time (e.g., fridge with periodic usage, TV with common usage times). The task
of a load disaggregator is then to find the combination of known appliance
power profiles that minimizes the error between the estimated power signal
and the household power draw. Computational complexity theory can be used
to describe how hard it is to find the best solution. It is widely applied
to quantify the difficulty or hardness of computational problems and to state
whether a (type of) problem is solvable at all and how the calculation time
scales with the problem size. In that sense, load disaggregation is shown to
be NP-hard by Hart [1]. Combining this knowledge, the NILM process is
sketched in Figure 1. The input for the actual load disaggregation is the
household power draw $P(t)$ that is generated by the usage of devices.
Characteristics of the single devices are known and used by the load
disaggregator. The device characteristics can be learned, be known a priori
or be entered by expert knowledge. The device usage is the unknown part
of the NILM process; it generates the power draw, which can be simple or
hard to disaggregate, even for the same set of devices. Therefore, a
complexity measure of load disaggregation has to be able to handle this
circumstance as well as the problems stated in Figure 2. These problems
include:
Figure 1: A performance metric for load disaggregation should only compare its
input with its output. The complexity of the input, which is the power draw,
must be assessed separately.


1. The complexity of aggregated loads increases with the number of
appliances due to the higher probability of ambiguous power draws.

2. The higher the switching frequency (as in the case of periodically
operating appliances such as a fridge), the more complex a device set is
for a load disaggregation algorithm.

3. Appliances with several operation states (i.e., multi-state appliances
instead of simple on/off appliances) make a device set more complex for a
load disaggregation algorithm.

4. The higher the similarity between appliance features, the more complex
the problem is. Similarity features are, for example, state power values or
consumption shapes.

5. Additional noise and unknown or unconsidered appliances interfere with
the household power draw and increase the complexity of the problem
because the presence of noise typically increases the number of possible
interpretations of a power draw.
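
As a rough illustration of Eq. (1) and of the first point above, the following
sketch enumerates all on/off combinations of a small appliance set and lists
those whose summed power lies within a tolerance of an observed value. The
appliance names, power values and the tolerance are made up for illustration;
this is a brute-force sketch under these assumptions, not a NILM algorithm
from the literature.

```python
from itertools import product

def matching_combinations(appliance_powers, observed, tolerance):
    """Return every set of switched-on appliances whose total power is
    within `tolerance` watts of the observed aggregate value."""
    names = list(appliance_powers)
    matches = []
    # Each appliance is either off (0) or on (1): 2^N candidate combinations.
    for states in product((0, 1), repeat=len(names)):
        total = sum(s * appliance_powers[n] for s, n in zip(states, names))
        if abs(total - observed) <= tolerance:
            matches.append(tuple(n for s, n in zip(states, names) if s))
    return matches

# Hypothetical appliance sets (names and watt values are assumptions).
small_set = {"lamp": 10, "radio": 20, "fan": 35}
large_set = {**small_set, "charger": 30, "router": 15}

few = matching_combinations(small_set, observed=30, tolerance=1)
many = matching_combinations(large_set, observed=30, tolerance=1)
print(few)
print(many)
```

With the larger set, the same observed value admits more than one
explanation, which is the kind of ambiguity the list above refers to.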

The aim is now to define a complexity measure describing the load
disaggregation problem by a comparable quantity without taking the used load
disaggregator into account. The complexity measure should be independent of
the used load disaggregation approach and describe the problem of aggregated
power loads. The problem is defined by the input to the load disaggregator,
which is the household power draw and the set of appliances. The set of
appliances is defined by the appliance consumption behavior in steady state.
Therefore, appliances are described by power consumption states; for example,
an on/off appliance is described by two states representing the appliance
being on or off.

Figure 2: Overview of different scenarios and characteristics of aggregate
power draws which increase the complexity of disaggregating the household
power demand.

As a first approach to describe the complexity of household power draws, the
well-known Shannon information could be considered. The Shannon information
or entropy [12] is defined for either a stream of symbols or a source in
general, if the symbol occurrence probabilities are known (or can be
assumed). For a specific source it means the averaged information of all
possible streams. It was developed for communication theory and is not
directly part of computational complexity theory. In the case of NILM the
power draw can be interpreted as a stream of symbols. The set of possible
symbols is then defined by the power values of single devices and their
possible combinations, respectively. Entropy reflects the difficulties in
NILM related to the number of involved appliances. Noise could be included
in a continuous formulation of Shannon's entropy, but the problems related to
very similar or equal power values for different states are not reflected.
Furthermore, the concept makes statements about averages over all
possibilities, or so-called typical sequences. So far it is unknown whether
load profiles are typical in that sense.
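
To make the symbol-stream interpretation concrete, the following minimal
sketch quantizes a power draw into symbols and estimates the Shannon entropy
from relative symbol frequencies. The quantization step and the sample values
are assumptions for illustration; estimating probabilities from frequencies
is only one possible choice.

```python
import math
from collections import Counter

def quantize(power_draw, step):
    """Map each power sample to an integer symbol (bin index)."""
    return [round(p / step) for p in power_draw]

def empirical_entropy(symbols):
    """Shannon entropy in bits per symbol, using relative frequencies
    as estimates of the symbol probabilities."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Made-up profiles: one on/off device versus two superimposed on/off devices.
one_device = [0, 0, 100, 100, 0, 0, 100, 100]
two_devices = [0, 60, 100, 160, 0, 60, 100, 160]
# Same usage pattern as one_device, but with nearly identical power levels.
similar_levels = [100, 100, 101, 101, 100, 100, 101, 101]

h_one = empirical_entropy(quantize(one_device, step=10))
h_two = empirical_entropy(quantize(two_devices, step=10))
h_similar = empirical_entropy(quantize(similar_levels, step=1))
print(h_one, h_two, h_similar)
```

The entropy grows with the number of distinct aggregated levels, but the
profile with two almost equal levels obtains the same value as the clearly
separated on/off profile, which illustrates why similarity between power
values is not captured.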

Another approach to describe complexity is to use the Kolmogorov complexity
[12]. The idea, which is widely used in computer science, is to describe the
complexity of a stream by the length of the shortest possible program that
can generate this specific stream. It is a theoretical concept, and there is
currently no general method to compute it. It can be approximated well in
practice, but uncertainty remains about the existence of a shorter
(undiscovered) solution. In this context the device usage in the NILM process
is interpreted as a program that produces the stream. The disaggregation
algorithm would then be a kind of "inverting" program. A periodic device
profile is not very complex in this sense, yet many NILM approaches have
difficulties in disaggregating it. The average Kolmogorov complexity of all
possible streams approaches the Shannon information as shown by [12, 13]. The
specification of load disaggregation problems requires a complexity measure
that describes a specific power draw, like the Kolmogorov complexity, and is
calculable, like the Shannon entropy.
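
A common practical stand-in for the Kolmogorov complexity is the length of a
compressed representation, which only gives an upper-bound style proxy. The
sketch below uses Python's zlib for this purpose; the text encoding of the
samples and the two made-up profiles are assumptions for illustration.

```python
import random
import zlib

def compressed_length(power_draw):
    """Length in bytes of the zlib-compressed sample sequence. This is a
    crude proxy for Kolmogorov complexity, not the quantity itself."""
    raw = ",".join(str(p) for p in power_draw).encode("ascii")
    return len(zlib.compress(raw, 9))

# Periodic, fridge-like on/off cycle (made-up values), 600 samples.
periodic = ([0] * 20 + [80] * 10) * 20

# Irregular switching between the same two levels, same length.
rng = random.Random(0)
irregular = [rng.choice([0, 80]) for _ in range(len(periodic))]

len_periodic = compressed_length(periodic)
len_irregular = compressed_length(irregular)
print(len_periodic, len_irregular)
```

The periodic profile compresses to far fewer bytes than the irregular one,
in line with the remark that a periodic device profile is not very complex
in the Kolmogorov sense.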
Because standard complexity measures such as the Shannon entropy and
the Kolmogorov complexity fail to describe the load disaggregation problem
entirely, a new complexity measure has to be introduced, which should fulfil
the following requirements:

1. Describes the load disaggregation problem and should not be dependent 
on the load disaggregation approach. 

2. Includes appliance descriptions as number of states and the similarities 
between appliances and states. 

3. Should be applicable to time series to describe the influence of appliance 
usage affecting the used NILM approach. 

4. Should be as easy to understand as standard complexity theories.

5. Need not be a general complexity measure. It is an application-dependent
complexity measure to make load disaggregation problems comparable
without considering the load disaggregator.

3 Novel Complexity Measure for Load Disaggregation

The proposed complexity measure should reflect the complexity of the load
disaggregation problem and should therefore make NILM problems comparable.
In this work we follow the idea that each possible produced power value is
a combination of the possible power states of the appliances. This results
in the task for the load disaggregator to find the combination of power
states that best matches the measured power values. The measured power value
is influenced by noise and should be approximated as well as possible,
enabling the load disaggregator to decide which appliances are running. The
main idea is to relate an observed power value to all possible power state
combinations under the influence of measurement noise.

3.1 Appliance Set Complexity 

One of the major factors influencing the complexity of aggregated power
profiles is the set of possible power values. The more complex the appliance
models and their operational states are, the more complex is the problem of
disaggregating the power profiles. In general, the appliance set is composed
of $N$ different appliances. With the knowledge of the appliance set and the
power demands of each appliance, the first step is to compute the number of
possible aggregated power values $M$. If there are only two-state devices,
there are $2^N$ combinations; in general,

$$M = 2^{N_2} \cdot 3^{N_3} \cdots = \prod_{z=2}^{z_{\max}} z^{N_z} \qquad (2)$$

different power values are possible. $N_2$ is the number of appliances with
two states, $N_3$ with three states and so forth. In the calculation of all
possible aggregated power values $P_i$, repetitions of the same value are
possible, for instance if a water kettle and a coffee machine consume the
same power. Exceptions are the 0 W power state (all off) and the all-on
state $P_M$, which is the highest possible power value. The vector $P$ is the
set of all possible (aggregated) power values $P_i$ for a set of appliances,
where $i \in [1, M]$.
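
The counting in Eq. (2) and the construction of the vector of aggregated
power values can be sketched as follows. Each appliance is represented by a
list of its state power values, including the off state; the concrete
appliance sets are assumptions chosen for illustration.

```python
from itertools import product
from math import prod

def number_of_combinations(appliance_states):
    """M as the product of the number of states per appliance, which equals
    2^{N_2} * 3^{N_3} * ... when grouped by state count."""
    return prod(len(states) for states in appliance_states)

def aggregated_power_values(appliance_states):
    """All aggregated power values P_i, sorted; duplicates are kept on
    purpose because different combinations may sum to the same value."""
    return sorted(sum(combo) for combo in product(*appliance_states))

# Three on/off appliances with 10, 20 and 35 W.
on_off_set = [[0, 10], [0, 20], [0, 35]]
# Two on/off appliances plus one three-state appliance (made-up values).
mixed_set = [[0, 10], [0, 20], [0, 50, 100]]
# Two appliances with identical power demand produce a repeated value.
twin_set = [[0, 1000], [0, 1000]]

print(number_of_combinations(on_off_set), aggregated_power_values(on_off_set))
print(number_of_combinations(mixed_set))
print(aggregated_power_values(twin_set))
```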

In its simplest form a NILM device observes a power value and compares
it to all possible values $P_i$ given by the device set. As long as there is
one single matching power value in the set, the task is solved in a
straightforward way. The problem is harder if there are either two or more
matching values or if the value is not in the set at all. For the
disaggregation complexity measure we reason that it should contain something
like a multiplicity or occupation number of the possible power values to
reflect multiple occurrences. The second case does not occur in ideal NILM
problems, but in reality it is likely that a measured power value does not
exactly match any of the $M$ aggregated power values. Therefore, we propose
to represent the possible power values by a distribution function instead of
a single value. Following this, it is possible to estimate, for a power value
that is not in the discrete set, the probability of being caused by the
respective state. This approach also covers uncertainties caused by adjacent
power values which can hardly be distinguished, e.g., through insufficient
measurement accuracy in the NILM device. A simple measure for the similarity
of two distributions is the overlapping coefficient

$$\mathrm{OVL}(f_1, f_2) = \int_x \min(f_1(x), f_2(x)) \, dx \qquad (3)$$

which gives the intersection area of the two distribution curves $f_1$ and
$f_2$ as stated in [14].
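
The overlapping coefficient of Eq. (3) can be approximated numerically. The
sketch below assumes two normal PDFs with a common spread parameter sigma
(treated here as a standard deviation, which is an assumption) and uses a
simple trapezoidal rule; the integration bounds, step count and example
values are illustrative choices.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def ovl(mu1, mu2, sigma, lower, upper, steps=20000):
    """Trapezoidal approximation of the integral of min(f1, f2)
    over [lower, upper]."""
    dx = (upper - lower) / steps
    total = 0.0
    for j in range(steps + 1):
        x = lower + j * dx
        y = min(normal_pdf(x, mu1, sigma), normal_pdf(x, mu2, sigma))
        # End points get half weight in the trapezoidal rule.
        total += y if 0 < j < steps else 0.5 * y
    return total * dx

identical = ovl(100.0, 100.0, sigma=5.0, lower=0.0, upper=200.0)
close = ovl(100.0, 105.0, sigma=5.0, lower=0.0, upper=200.0)
far = ovl(100.0, 160.0, sigma=5.0, lower=0.0, upper=200.0)
print(round(identical, 3), round(close, 3), round(far, 3))
```

Identical power values overlap completely, values a few watts apart overlap
partially, and well separated values have practically no overlap.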

For a load disaggregation complexity measure $C$ we propose to estimate the
similarity of one power value distribution to all the other possible
aggregated power value distributions. The possible power values are expected
between 0 and $P_M$. By use of the overlapping coefficient the disaggregation
complexity measure for the power state $P_k$ is defined as

$$C_k = \sum_{i=1}^{M} \mathrm{OVL}(f_{P_k}, f_{P_i}) = \sum_{i=1}^{M} \int_{0}^{P_M} \min(f_{P_k}(p), f_{P_i}(p)) \, dp \qquad (4)$$

$C_k$ is the disaggregation complexity of the power value $P_k$ within the
set of $M$ power state combinations. The parameter $k$ determines the chosen
reference power state combination, where $k \in [1, M]$. In case the exact
distribution of the power values is not known, it is reasonable to assume a
normally distributed probability density function (PDF) $N(\mu, \sigma)$. The
mean value $\mu = P_k$ represents the observed power value and a variance
$\sigma$ expresses the measurement and model uncertainties.

Figure 3: A sketch of the different PDFs for each power value produced by the
combination of all available power demands of an appliance set. The appliance
set consists of three on-off appliances with demands of 10, 20 and 35 W.

To evaluate the complexity of an appliance set, it is now possible to apply
the introduced disaggregation complexity to each possible combined power
value. This yields information about which power values, and therefore which
appliance state combinations, are more complex than others.

Figure 3 sketches an example of how to estimate the disaggregation
complexity. For a given set of three on-off devices with {10, 20, 35} W we
estimate the complexity for the power value $P_k$ of 30 W, which represents
the case when devices one and two are turned on. The set has $M = 8$ possible
power values in total. Each power state is represented by the same normally
distributed PDF. The final disaggregation complexity value is then the sum of
all overlapping areas, like $A_1$, $A_2$ and $A_3$ shown in Figure 3. The
introduced disaggregation complexity $C$ can be interpreted as a similarity
factor of power states in the appliance set.

Accordingly, a disaggregation complexity $C$ of 1 means that at least one
solution or appliance state is equal to the wanted power value. But it can
also mean that two power value distributions match with similarity 0.5. A
disaggregation complexity of $C = 2$ means that, in the case of two
appliances, each of them has the same power demand. Exceptions are the
all-off power state (0 W) and the maximum power demand $P_M$. Because the
complexity computation is bounded by $[0, P_M]$, these states show a value of
$C = 0.5$. The values of $C$ also depend on the chosen variance $\sigma$ of
the PDF. The higher the value of $\sigma$, the higher is the probability of
intersections between power values. This means that the higher $\sigma$, the
higher is the appliance set complexity. A whole appliance set is
characterized by its power states complexity spectrum, which shows the
complexity value for each of the aggregated power state values. The power
states complexity spectrum shows in which regions confusions of states, and
therefore wrong appliance detections, are more likely.
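
The power states complexity spectrum of the {10, 20, 35} W example can be
sketched as follows. The code evaluates Eq. (4) numerically with normal PDFs
of equal spread; the value sigma = 5 W, the treatment of sigma as a standard
deviation and the trapezoidal integration are assumptions for illustration.

```python
import math
from itertools import product

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def bounded_overlap(mu_a, mu_b, sigma, upper, steps=4000):
    """Trapezoidal estimate of the integral of min(f_a, f_b) over [0, upper]."""
    dx = upper / steps
    total = 0.0
    for j in range(steps + 1):
        x = j * dx
        y = min(normal_pdf(x, mu_a, sigma), normal_pdf(x, mu_b, sigma))
        total += y if 0 < j < steps else 0.5 * y
    return total * dx

def complexity_spectrum(appliance_states, sigma):
    """List of (P_k, C_k); the sum runs over all i, including i = k."""
    values = sorted(sum(combo) for combo in product(*appliance_states))
    p_max = values[-1]
    return [
        (p_k, sum(bounded_overlap(p_k, p_i, sigma, p_max) for p_i in values))
        for p_k in values
    ]

appliances = [[0, 10], [0, 20], [0, 35]]   # three on/off appliances
spectrum = complexity_spectrum(appliances, sigma=5.0)
for p_k, c_k in spectrum:
    print(f"P_k = {p_k:3d} W   C_k = {c_k:.2f}")
```

With a very small sigma the overlaps between different values vanish, so the
interior states approach $C = 1$ and the two boundary states approach
$C = 0.5$, matching the interpretation given above.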









3.2 Time Series Complexity of Aggregated Power Profiles 

The introduced disaggregation complexity $C$ considers the appliance set and
its characteristics but does not refer to a specific aggregated power
profile. Therefore, we introduce the time series disaggregation complexity
$C_{total}$, which is a weighted average of the complexities of the power
values within a time series. It considers the appliance set implicitly
through the disaggregation complexity. The usage of the different appliances
is reflected by the power values in the profile. We define the time series
disaggregation complexity of an aggregated power draw as

$$C_{total} = \frac{1}{T} \sum_{t=1}^{T} C_t = \frac{1}{T} \sum_{t=1}^{T} \sum_{k=1}^{M} \mathrm{OVL}(f_{P_t}, f_{P_k}) \qquad (5)$$

where $T$ represents the number of observed power samples. This
disaggregation complexity $C_{total}$ describes the averaged complexity of
the observed power values within all possible appliance state combinations
for the whole observation time. The same power profile can have a different
total complexity with respect to the appliance set. That reflects exactly the
difficulty in load disaggregation of distinguishing between similar devices.
Calculation of $C_{total}$ requires knowledge of the respective appliance
set, i.e., the number of states, the power values and their distribution (or
reasonable assumptions about it).
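
A minimal sketch of the time series complexity is given below. It reads
Eq. (5) as the mean over the $T$ samples of the per-sample complexity, which
is an assumption about the normalization, and again uses normal PDFs with an
assumed sigma of 5 W. The power profile and the two appliance sets are made
up to show that the same profile can yield different values.

```python
import math
from itertools import product

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def bounded_overlap(mu_a, mu_b, sigma, upper, steps=4000):
    """Trapezoidal estimate of the integral of min(f_a, f_b) over [0, upper]."""
    dx = upper / steps
    total = 0.0
    for j in range(steps + 1):
        x = j * dx
        y = min(normal_pdf(x, mu_a, sigma), normal_pdf(x, mu_b, sigma))
        total += y if 0 < j < steps else 0.5 * y
    return total * dx

def total_complexity(power_profile, appliance_states, sigma):
    """Mean over time of C_t = sum_k OVL(f_{P_t}, f_{P_k})."""
    values = [sum(combo) for combo in product(*appliance_states)]
    p_max = max(values)
    # Each distinct observed value is evaluated once and then reused.
    per_value = {
        p_t: sum(bounded_overlap(p_t, p_k, sigma, p_max) for p_k in values)
        for p_t in set(power_profile)
    }
    return sum(per_value[p_t] for p_t in power_profile) / len(power_profile)

profile = [0, 100, 100, 0, 100, 0]      # observed aggregate power in W
distinct_set = [[0, 100], [0, 500]]     # clearly separated appliances
similar_set = [[0, 100], [0, 105]]      # two nearly identical appliances

c_distinct = total_complexity(profile, distinct_set, sigma=5.0)
c_similar = total_complexity(profile, similar_set, sigma=5.0)
print(round(c_distinct, 2), round(c_similar, 2))
```

The profile is identical in both cases, but the set with two similar
appliances leads to the higher total complexity.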

Application of the total complexity to a power profile that lacks any
metadata requires additional assumptions. Those must be stated to keep the
complexity understandable. A first attempt would therefore be to analyze the
power profile histogram to reason about the aggregated states and
distributions. Distortion can occur if two values close to each other are
considered as different states instead of being included in the same
distribution. This approximately doubles the single value complexity and
consequently leads to an overestimation of the total profile complexity. For
instance, small power values (which are hardly distinguishable from noise)
can be removed by a threshold to avoid many very similar states that cause a
misleadingly high complexity number.


4 Evaluation Settings 

4.1 Real World Dataset 

To test the disaggregation complexity metric on different test cases we
performed our complexity study on three different datasets. The first choice
is the open REDD dataset [15]. We chose three different houses from the
dataset, where 6 appliances were selected because their characteristics
affect the household power demand in a significant way [16]. Furthermore, we
used the open dataset GREEND [17], which documents an appliance-level
measurement campaign in Austria and Italy. As for the REDD dataset, we chose
3 houses with 6 different appliances as representative for our evaluation.
The ECO dataset [18], which monitored electricity consumption and occupancy
in 9 Swiss houses, was also used. 3 houses with 6 different appliances were
chosen. In Table 1 the appliances for each house and dataset are listed. For
our evaluation we chose the whole observation time for the REDD dataset and
two weeks for the GREEND and ECO datasets. These assumptions are valid
throughout the paper unless stated otherwise.





Dataset | House | Appliance Type | Detected Power (submetered) | Detected Power (aggregated)

Table 1: List of datasets (REDD, ECO-dataset, GREEND) with 6 chosen appliances and their appliance power states detected
for submetered power draws and the aggregated power draw.



4.2 Identification of Appliance Power States 

To be able to compute the two complexity measures, the set of occurring power
states is necessary. If metadata provides this information, it could be
used, but for most cases and datasets this information is either not provided
or not available to the desired extent. Accordingly, the most obvious
approach would be to use expert knowledge to identify the appliance states
and their power demand. But this process is time consuming and error-prone.
Therefore, an automatic state detection algorithm is presented, based on the
approach published in [19]. It automatically detects the most common power
states in any used power draw. In detail, the approach starts with removing
high-frequency noise from the signal by median filtering. The signal is then
sharpened to get sharp edges, which is helpful and also necessary for edge
detection. The edge detection finds rising and falling edges in the signal
based on a predefined threshold. With the set of rising edges and the set of
falling edges, edge pairs are identified. These detected edge pairs are then
the basis for frequency counting, which is done by building a histogram of
the edge pairs that occurred. Based on the histogram, it is decided whether
an edge pair is representative for the considered observation window. More
detailed information about the detection process can be found in [19].
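
The detection steps described above can be sketched in a strongly simplified
form. The following code is not the implementation of [19]: the sharpening
step is omitted, and the window size, pairing tolerance, histogram bin width,
minimum count and the example signal are assumptions for illustration. Only
the 25 W edge threshold is taken from the parameter setting reported in this
paper.

```python
from collections import Counter
from statistics import median

def median_filter(signal, window=3):
    """Suppress short spikes by replacing each sample with a local median."""
    half = window // 2
    return [median(signal[max(0, i - half): i + half + 1]) for i in range(len(signal))]

def detect_edges(signal, threshold):
    """Return (index, delta) for each step whose magnitude reaches the threshold."""
    edges = []
    for i in range(1, len(signal)):
        delta = signal[i] - signal[i - 1]
        if abs(delta) >= threshold:
            edges.append((i, delta))
    return edges

def pair_edges(edges, tolerance):
    """Match each falling edge with an earlier unmatched rising edge of
    similar magnitude; return the mean magnitude of every pair."""
    pairs, open_rising = [], []
    for _, delta in edges:
        if delta > 0:
            open_rising.append(delta)
            continue
        for rise in open_rising:
            if abs(rise + delta) <= tolerance:
                open_rising.remove(rise)
                pairs.append((rise - delta) / 2)
                break
    return pairs

def frequent_states(pairs, bin_width, min_count):
    """Histogram of edge pair magnitudes; keep bins that occur often enough."""
    histogram = Counter(round(p / bin_width) * bin_width for p in pairs)
    return sorted(level for level, count in histogram.items() if count >= min_count)

# Made-up power draw in W: a recurring 100 W load, one 300 W spike, one 2000 W event.
signal = [0, 0, 100, 100, 100, 0, 0, 300, 0, 0, 100, 100, 0, 0,
          2000, 2000, 0, 0, 100, 100, 100, 0, 0]
filtered = median_filter(signal, window=3)
pairs = pair_edges(detect_edges(filtered, threshold=25), tolerance=10)
states = frequent_states(pairs, bin_width=10, min_count=2)
all_states = frequent_states(pairs, bin_width=10, min_count=1)
print(states, all_states)
```

In this toy signal the single-sample spike is removed by the median filter,
and only the recurring 100 W step passes the frequency criterion with a
minimum count of two.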

The detection approach can be applied to submetered measurement data as
well as to aggregated power measurements. The two scenarios produce
different outputs. The submetered measurements can produce multi-state
power states of appliances, and similarities between appliances and their
power states are possible. In contrast, the aggregated power measurement
data produces a set of power states without any information about appliances
and their number of states. Only different power states are detected, not
different appliances. Considering this input case, no similarities between
appliances are possible. The algorithm tries to find a unique set of power
states. However, we want to clarify that the use of this detection approach
is not necessary for the calculation of the complexity values. The
complexity values can be computed with any detection approach providing a
set of appliance power states in which the appliances are described as
on/off or multi-state appliances.

In this work the proposed detection approach is applied to each house and
dataset. The results are listed in Table 1. The parameters of the algorithm
are set as in [19]; for example, the power threshold for a valid power edge
was 25 W.

4.3 Load Disaggregation Algorithms 

The proposed complexity values should describe the complexity of aggregated
power loads. To get an idea of how meaningful the proposed complexity
approaches are, the results should be compared to the results of an
appropriate and suitable load disaggregation approach. This comparison should
give quantitative feedback on whether the complexity value is meaningful with
respect to the used load disaggregation approach. We claim that the load
disaggregation approach needs to have the same inputs as described in
Section 2 to be able to provide meaningful results.

Thus, in the following, we review state-of-the-art load disaggregation
algorithms, which can be divided into supervised and unsupervised algorithms.
Supervised NILM approaches are based on labelled data on which pattern
recognition and optimization algorithms are applied [20, 21, 22, 23, 24, 25,
26, 27]. Labelled data and the corresponding need for a training phase are
the main disadvantage of supervised algorithms because of increased
development costs and efforts. In contrast to supervised algorithms,
unsupervised algorithms need neither labelled data nor training phases.
Recent representatives of unsupervised algorithms are based on k-means
clustering [28] and on factorial hidden Markov models (FHMM) and their
variants [29, 30, 31, 32, 33]. In this work, the Particle Filtering (PF)
approach to perform load disaggregation is used [34]. The PF is beneficial
for NILM because of its ability to handle non-linear problems within large
state spaces that suffer from non-Gaussian noise. Basically, the PF
estimates the power states of appliances based on the approximated posterior
density of power states estimated by a random set of particles. The appliance
state space is composed of multiple independent Hidden Markov Models (HMM)
representing, on the one hand, appliances such as on/off or multi-state
appliances and, on the other hand, the aggregated power draw for a household
power demand. The composition of HMMs leads to the use of a Factorial Hidden
Markov Model (FHMM), which has the advantage of decreasing the number of
states compared to using one large standard HMM. As a final step, the work in
[34] uses a decision maker based on thresholding to decide which appliance is
working at which operating state with knowledge of the appliance HMM. The
appliance model includes, on the one hand, the approximated power demand and,
on the other hand, the general structure, such as how many states a device
has. The PF is able to work with unknown and inaccurate transition matrix
settings. The PF-based approach is suitable for our evaluation because the
algorithm can handle a set of appliances modelled as on/off or multi-state
appliances and performs load disaggregation based on a set of power states
and the aggregated power draw. For the evaluation the PF is parametrized as
in [34], where the number of used particles, as the most important
parameter, is set to 1000 particles.
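
To illustrate the general idea of particle filtering for on/off appliances,
a strongly simplified bootstrap particle filter is sketched below. It is not
the algorithm of [34]: the independent switching probability per appliance,
the Gaussian likelihood with spread sigma, the 0.5 decision threshold and the
example data are assumptions made for this sketch. Only the particle count of
1000 follows the setting mentioned above.

```python
import math
import random

def particle_filter(observations, appliance_powers, n_particles=1000,
                    switch_prob=0.1, sigma=10.0, seed=0):
    """Estimate on/off states per time step from the aggregate power."""
    rng = random.Random(seed)
    n = len(appliance_powers)
    # A particle is one hypothesis: a tuple of on/off states, one per appliance.
    particles = [tuple(rng.randint(0, 1) for _ in range(n))
                 for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Prediction: every appliance flips its state with a small probability.
        particles = [tuple(s ^ (rng.random() < switch_prob) for s in p)
                     for p in particles]
        # Update: weight each hypothesis by how well its total power explains y.
        weights = []
        for p in particles:
            predicted = sum(s * w for s, w in zip(p, appliance_powers))
            weights.append(math.exp(-0.5 * ((y - predicted) / sigma) ** 2))
        total = sum(weights)
        if total == 0.0:                 # no hypothesis explains y at all
            weights = [1.0] * n_particles
            total = float(n_particles)
        # Decision: an appliance is reported as on if its posterior
        # on-probability exceeds 0.5.
        on_prob = [sum(w for p, w in zip(particles, weights) if p[i]) / total
                   for i in range(n)]
        estimates.append(tuple(int(q > 0.5) for q in on_prob))
        # Resampling: draw a new particle set proportional to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates

powers = [100, 500]                                    # two on/off appliances in W
aggregate = [0, 0, 100, 100, 600, 600, 500, 500, 0, 0]
estimated_states = particle_filter(aggregate, powers)
print(estimated_states)
```

For this noise-free toy example with clearly separated power values, the
estimated states follow the switching events of both appliances.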

5 Case Study 

5.1 Appliance Set Complexity for Different Datasets and 
Different Sets of Power States 

As described in the previous sections, the appliance set complexity aims
to describe the complexity of the used appliance set without considering the
appliance usage over time. Therefore, the most relevant parameters are the
power values used for each appliance power state. These power states are
identified for each appliance using the algorithm presented in Section 4.2.
The algorithm creates power states from measurement data, for which we used
aggregated measurement data and submetered power readings. In the case of
aggregated power readings the algorithm has no information about the number
of appliances. Thus, it can distinguish between different power states but
not between different appliances. Appliances with similar power consumption
are handled as single on/off devices, and multi-state behavior is not
assigned to appliances. For the evaluation the aggregated power consumption
is created by superimposing the power draws of the appliances chosen in
Table 1. In contrast, we used submetered power readings to create multi-state
appliances, with appliances having similar power states. With this
consumption data of an appliance it is possible to identify
appliance-specific power states which can be combined to a multi-state
appliance. We performed the power state detection for each device listed in
Table 1. The created set of appliances consists of on/off and multi-state
appliances in which the power states between appliances can be the same,
differ by only a few watts or be completely different. The created appliances
for the aggregated consumption data and for the submetered power readings are
presented for each used dataset in Table 1.

Dataset   House   submetered        aggregated
                  max      mean     max     mean
REDD      1       16.91    7.88     2.28    1.48
REDD      2        6.17    2.62     2.32    1.33
REDD      3       21.39    8.69     1.98    1.32
ECO       1        6.65    2.88     2.67    1.36
ECO       2       12.06    4.75     1.44    1.04
ECO       3       16.62    6.53     1.59    1.15
GREEND    1       18.20    7.17     2.01    1.19
GREEND    2        4.46    2.18     1.36    1.07
GREEND    3       48.36   24.43     1.87    1.18

Table 2: List of mean and maximum of the appliance set complexity for each
house and dataset

In this case study the appliance set complexity is evaluated on the appliance
sets based on aggregated power readings and on submetered power readings. As
input for the complexity computation a vector of all possible power state
combinations of the appliance set is used. The results are presented in Table 2
using the mean and the maximum value of the appliance set complexity. The
complexity values for the submetered data are higher than for the aggregated
power readings, i.e., the submetered appliance sets pose the more complex
problem. We attribute this to the fact that similarities between appliances are
lost in the case of aggregated loads because the algorithm cannot distinguish
between appliances; with aggregated power readings it is only possible to
distinguish between different power states. Appliances derived from submetered
data, in contrast, are affected by power state similarities and therefore have
a higher appliance set complexity. As a consequence, the problem complexity for
the same house of a dataset differs between appliance sets created from the
aggregated and from the submetered power data, which strengthens the need for a
complexity measure that accounts for different preprocessing stages of the
power data.

Figure 4^ presents the appliance set complexity for each dataset and for each
possible power state combination, based on the appliance states produced from
the submetered power readings. White means that the appliance set complexity is
zero because this power value cannot be produced by any combination of the
stored power states for the given dataset and house. The color scale runs from
green (low complexity) over blue (medium complexity) to red (high complexity).
The colors are normalized to the maximum appliance set complexity occurring
over all datasets. Figure 4 shows which dataset and house is more complex
according to the used power states presented in Table 1. For example, house 2
of the GREEND dataset has a very low appliance set complexity, while house 3 of
the GREEND dataset has a very high and tight appliance set complexity.
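
The vector of all possible power state combinations that serves as input for
the appliance set complexity can be enumerated as in the following sketch. It
assumes that every appliance is either off (0 W) or in exactly one of its
listed power states; the function name and the example appliance set are
hypothetical.

```python
from itertools import product
from typing import List, Sequence


def state_combinations(appliances: Sequence[Sequence[float]]) -> List[float]:
    """Return the total power of every combination of appliance states.

    Assumption: each appliance is off (0 W) or in exactly one listed state.
    """
    options = [(0.0, *states) for states in appliances]
    return [sum(combo) for combo in product(*options)]


# Hypothetical appliance set: two on/off devices and one two-state device.
appliances = [[100.0], [50.0, 1200.0], [100.0]]
totals = state_combinations(appliances)
distinct = sorted(set(totals))
```

Here the two 100 W devices produce identical totals, so several combinations
collapse onto the same power value. This is the kind of similarity that is
visible with submetered appliance models but cannot be resolved from aggregated
readings alone.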


^For readability please consider coloured prints

5.2 Time Series Complexity for Different Datasets and
Different Sets of Power States

The appliance set complexity gives feedback about the complexity of the used
appliances by comparing their power states and appliance structure. For the
load disaggregation problem, another important factor is the appliance usage
over time, i.e., how and when appliances are operated. Operation can, for
example, be user driven (e.g., coffee machine, TV) or periodic (e.g., fridge).
The proposed time series complexity takes these circumstances into account. For
the evaluation of this complexity measure, the time series of all houses and
datasets are considered for an observation window of half a day. The inputs for
the complexity computation are the measurement samples, which are combinations
of possible power states affected by noise. In contrast, the appliance set
complexity takes noise-free power state combinations as input. As appliance
sets, the appliances based on aggregated and on submetered power data are used.
Table 3 presents the mean and the maximum of the time series complexity for all
houses and datasets. Moreover, a time snippet of a time series of house 3 of
the ECO dataset with the corresponding complexity value for each measurement
sample is presented in Figure 5^. White and green mean low complexity, blue
means medium complexity and red means high complexity. The coloring is
normalized to the maximum complexity value occurring within the considered
observation time and measurement samples. Comparing the colormap with the time
series shows that overlapping appliance behavior results in a high complexity
value, while high power values do not necessarily result in a high complexity.


^For readability please consider coloured prints

Figure 5: Time snippet of the power readings for house 3 of the ECO dataset with
a colormap of the complexity for each measurement sample. White to green means
low complexity, blue means medium complexity and red means high complexity.

Dataset  House  Submetered       Aggregated
                max      mean    max    mean
REDD     1      13.79    1.04    1.62   0.50
REDD     2      5.39     0.54    2.32   0.11
REDD     3      17.54    1.07    1.98   0.35
ECO      1      3.71     0.95    2.62   0.15
ECO      2      11.99    2.86    1.11   0.19
ECO      3      14.77    4.91    1.57   0.41
GREEND   1      7.77     0.89    1.06   0.12
GREEND   2      4.305    0.91    1.35   0.50
GREEND   3      45.01    3.67    1.81   0.04

Table 3: List of mean and maximum of the time series complexity for each house
and dataset


5.3 Load Disaggregation of Complexity Marked Power 
Readings 

In this case study the results of the complexity measures are compared with the
results of a NILM approach on the same power data. The aim is not to evaluate
the disaggregation approach itself; rather, the evaluation should give feedback
about the suitability and meaningfulness of the proposed complexity measures.
As described in Section 4, we used the load disaggregation algorithm from [34],
which is able to handle on/off and multi-state appliances. We used the
appliance set and models identified from the submetered measurements, which are
listed in Table 4.

We use the submetered data rather than the aggregated power readings because
the evaluation requires ground truth data. The appliance set in Table 4 differs
from the one listed in Table 1 because the appliance state identification
algorithm from Section 4 considered only the most common appliance power states
in this case study. We defined a detected power state as one of the most common
power states if it occurred at least 15% as often as the most frequently
occurring power state. We used power readings of a whole day to calculate the
time series complexity. The load disaggregation algorithm is evaluated by
comparing the real and the estimated energy in kWh at appliance level and for
the aggregated power readings. The results for each house and dataset for all
used appliances are shown in Table 5.
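
The selection of the most common power states can be sketched as follows. This
is a minimal illustration of the 15% rule, assuming that the detected states
have already been quantized to discrete power values and that a state is kept
when its count reaches the threshold; the function name and the example counts
are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def common_power_states(detected: Iterable[int], ratio: float = 0.15) -> List[int]:
    """Keep states whose count is at least `ratio` times the highest count."""
    counts = Counter(detected)
    if not counts:
        return []
    threshold = ratio * max(counts.values())
    return sorted(state for state, n in counts.items() if n >= threshold)


# Hypothetical detections in watts: 60 W seen 100 times, 1533 W seen 20 times,
# 900 W seen 5 times (illustration only).
detections = [60] * 100 + [1533] * 20 + [900] * 5
kept = common_power_states(detections)
```

With these counts the threshold is 15 occurrences, so the rarely observed
900 W state is dropped and the appliance is modeled with two states.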

Dataset  House  Appliance States
REDD     1      [1690 2455], [190], [210 410 880 1110], [60 1533],
                [260 710 1440], [2712]
REDD     2      [770], [145], [410], [1875], [1050], [160]
REDD     3      [120], [210], [2255], [130 1740], [960 1290 1610], [360 900]
ECO      1      [40], [780], [50 1205], [1795], [80], [90]
ECO      2      [120 2060 2170], [70], [55 178], [50], [1845], [160]
ECO      3      [100], [55 1085 1520], [130], [100], [120], [1330 1567]
GREEND   1      [50 1270], [55 1840], [50 140], [40 1900], [1790], [1220]
GREEND   2      [80], [80 1730], [850], [90 160 1910], [1580], [60]
GREEND   3      [60], [72 2020], [160 2415], [70], [1230], [1030]

Table 4: Appliance set used by the load disaggregation approach. The appliances
differ from the ones in Table 1 because we considered only the most common
power states per device for this case study

Less complex time series, as in REDD house 2, are in general easier to
disaggregate than more complex time series, as in ECO house 2. Similar power
states, as for example in houses 1 and 2 of the ECO dataset, strongly affect
the load disaggregation result. In the case of similar power states the
algorithm is not able to distinguish between the affected appliances, which
supports the need for a common complexity measure for load disaggregation.
Because a different power state identification setting was used, the appliance
set complexity also differs from the previous case studies. This strengthens
our argument for a complexity measure that handles the set of appliance power
states independently of the used load disaggregation algorithm.

6 Discussion 

In the previous section different case studies were presented to evaluate the
usefulness of the proposed complexity measures. For example, the case study on
the appliance set complexity shows that the complexity is highly dependent on
the used appliance set. The number of devices, the presence of several states
per device, and similar states between appliances strongly affect the load
disaggregation complexity. The complexity is higher for more complex appliance
sets and is not dependent on the used house or dataset. Thus, we claim that the
preprocessing stage has an important effect on the problem complexity and
accordingly also on the result of the used load disaggregation process. This
also holds for the time series complexity: using an appliance set with complex
appliances and structures strongly affects the time series complexity for the
same house of a dataset.

The time series complexity is highly affected by the appliance usage. For
the evaluation of the case study an observation window of a day was used. We
claim that even complex appliance sets, such as house 3 of the GREEND dataset,
can have a low time series complexity due to their appliance usage over time.
Thus, the appliance set complexity and the time series complexity do not
correlate with each other. For example, a high appliance set complexity can
lead to a low or a high time series complexity. We also show that the proposed
complexity measures can classify the complexity of a load disaggregation
problem but do not correlate with the used load disaggregation approach. The
result of the load disaggregation approach cannot be estimated by our proposed
measures; instead, they describe the problem, including its preprocessing
stages, and make it comparable. However, the case studies showed that the
proposed complexity measures fulfil the requirements identified in Section 2.
They use a given appliance dataset to compute the appliance set complexity as
well as the time series complexity. The complexity measures can be interpreted
as a similarity measure between a set of possible states (appliance set
complexity) and a similarity measure in which the set of possible states is
compared with noisy observed data (time series complexity). The same principle
for measuring complexity is applicable to other attributes.
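
The similarity interpretation can be made concrete with a small sketch. The
exact definition of the complexity measures is given earlier in the paper and
is not reproduced here; the Gaussian overlap kernel, the function names, the
candidate power values and the noise level below are assumptions chosen purely
for illustration.

```python
import math
from typing import Sequence


def gaussian_overlap(a: float, b: float, sigma: float) -> float:
    """Overlap area of two normal densities with means a, b and equal sigma."""
    return math.erfc(abs(a - b) / (2.0 * math.sqrt(2.0) * sigma))


def similarity_sum(value: float, states: Sequence[float], sigma: float) -> float:
    """Sum the overlap of `value` with every candidate power value."""
    return sum(gaussian_overlap(value, s, sigma) for s in states)


# Hypothetical candidate power values (W) and noise level (illustration only).
states = [0.0, 100.0, 105.0, 1200.0]
sigma = 5.0
isolated = similarity_sum(1200.0, states, sigma)  # overlaps only with itself
crowded = similarity_sum(100.0, states, sigma)    # also overlaps with 105 W
noisy = similarity_sum(650.0, states, sigma)      # observed value far from all states
```

Under these assumptions a value that coincides with an isolated state scores
about one, a value surrounded by similar states scores higher, and an observed
value far from every possible state scores close to zero. The first two cases
correspond to comparing possible states with each other, the third to comparing
noisy observed data with the set of possible states.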

7 Conclusion


This paper defined two complexity measures for the problem of load
disaggregation, which deals with the task of breaking down the aggregated power
draw of a household into its appliance components. Appliance characteristics
together with smart algorithms are used to solve this task. One important
aspect is the distinction between the disaggregation approach itself and the
problem posed by the aggregated power profiles. Besides clear performance
measures for NILM algorithms, a clear definition is needed to specify the
hardness or complexity of a specific case. This makes a fair comparison of
different NILM approaches with respect to the underlying load disaggregation
problem possible. To overcome the lack of a means to compare load
disaggregation problems, we introduced two novel complexity measures that
assess the complexity of a load disaggregation problem based on the used
appliance sets. With the proposed complexity measures the used appliance sets
and the aggregated power readings are evaluated for their complexity. To
evaluate how well the disaggregation complexity measures reflect real load
disaggregation problems, we performed the complexity calculation and load
disaggregation with a state-of-the-art NILM approach on different datasets and
time series. Our evaluations show that our disaggregation complexity measure is
able to assess the hardness of an appliance dataset as well as of a specific
time series for a NILM algorithm. We want to emphasize that the presented
complexities are relative and not absolute measures of the problem complexity.
Thus, knowing the disaggregation complexity is not sufficient to determine the
performance of the load disaggregator. The presented measure gives meaningful
results for load disaggregation problems with a single feature, such as the
active power, representing each power state of an appliance. Future work will
deal with several features, such as active and reactive power.